# install.packages("tidyverse")
# install.packages(c("forecast", "quantmod", "crypto2", "RcppRoll", "rugarch", "fGarch"))
# install.packages(c("fable", "tsibble", "feasts", "keras", "highcharter", "plotly"))
Time-series & cryptocurrencies
WARNING: this notebook relies on several R packages. Whenever one is called (for instance via a call such as "library(torch)"), this implies that the package has been installed, e.g., via install.packages("torch"). Hence, make sure the packages listed above are installed!
library(tidyverse)
Crash course on time-series
Definitions & notations
Time-series are random variables indexed by time. Usually, they are written \(x_t\), or \(X_t\) or \(y_t\) or similar notations of this kind. The index \(t\) refers to time. The essential question in time-series is to determine the law of motion of the process, as well as its characteristics.
Most often, the process integrates some form of randomness, which is usually written \(e_t\), \(\epsilon_t\) or \(\varepsilon_t\). There are two common simplifying assumptions that are made with respect to these quantities, which are sometimes referred to as innovations or residuals:
- they are normally distributed (with Gaussian law);
- they are independent from each other.
NOTE: in almost all models, innovations have zero means, \(\mathbb{E}[e_t]=0\), hence in the case of Gaussian innovations, only the variance \(\sigma^2\) is specified. We will work under these (standard) assumptions below. Let’s generate a series of \(e_t\) and plot it below.
N <- 120
sigma <- 0.2
x <- rnorm(N, mean = 0, sd = sigma)
data.frame(t = 1:N, x = x) |>
ggplot(aes(x = t, y = x)) + geom_line() + theme_minimal()
mean(x)
[1] -0.03532763
sd(x)
[1] 0.2185235
Examples
The random walk
The simplest case is the white noise: \[x_t= e_t,\]
but this is not very interesting. A second important case is linked to the notion of random walk: \[x_t=x_{t-1}+e_t,\] and, iterating, we see that we have \(x_t=\sum_{s=1}^t e_s\) - where we assume a starting point \(x_0=0\). Notably, we have \[\mathbb{E}[x_t]=\mathbb{E}\left[\sum_{s=1}^t e_s\right]=\sum_{s=1}^t\mathbb{E}\left[ e_s\right]=0.\]
Moreover, if the innovations are independent,
\[\mathbb{V}[x_t]=\mathbb{V}\left[\sum_{s=1}^t e_s\right]=\sum_{s=1}^t\mathbb{V}\left[ e_s\right]=t\sigma^2.\]
Hence, the variance increases linearly with time. If perturbations are not independent, then it's a bit more complicated and the variance will depend on all auto-covariances.
The covariance between two points in time is driven by the number of overlapping innovations:
\[\mathbb{C}ov[x_t,x_{t+h}] = \mathbb{E}\left[\sum_{s=1}^t\sum_{r=1}^{t+h} e_se_r\right]=t\sigma^2, \quad h \ge 0.\] This is because all innovation terms are independent, hence the only ones that are non-null are the \(t\) terms of the form \(\mathbb{C}ov[e_s,e_s]=\sigma^2\).
Hence, we see that both the variance and auto-covariance depend on \(t\), say, the present time. This makes analysis complicated, because, depending on the moment we consider, the distribution of \(x_t\) will change.
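The linear growth of the variance can be checked numerically. Below is a minimal base-R Monte Carlo sketch (the number of paths and the value of sigma are illustrative choices): we simulate many random walks and compare the cross-sectional variance at each date with \(t\sigma^2\).

```r
# Monte Carlo check: simulate many random walks and verify Var[x_t] ~ t * sigma^2
set.seed(1)
n_paths <- 5000
N <- 120
sigma <- 0.2
# each column of 'paths' is one random-walk trajectory of length N
paths <- apply(matrix(rnorm(N * n_paths, sd = sigma), nrow = N), 2, cumsum)
emp_var  <- apply(paths, 1, var)  # empirical variance across paths, at each t
theo_var <- (1:N) * sigma^2       # theoretical t * sigma^2
summary(emp_var / theo_var)       # ratios should hover around 1
```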
Let’s simulate one trajectory.
#set.seed(42) # This freezes the random number generator
x <- cumsum(rnorm(N, mean = 0, sd = sigma))
tibble(t = 1:N, x = x) |>
ggplot(aes(x = t, y = x)) + geom_line() + theme_minimal()
The auto-regressive model
Let us introduce a small modification to the above process and consider: \[x_t=a+\rho x_{t-1}+e_t, \quad a \in \mathbb{R}, \quad |\rho|< 1.\] This process is called the Auto-Regressive model of order one and is written AR(1). Iterating, we have: \[x_t = a + \rho (a + \rho x_{t-2}+e_{t-1})+e_{t}\] and, iterating to infinity, \[x_t=a \sum_{k=0}^\infty\rho^k+ \sum_{k=0}^\infty \rho^ke_{t-k}=\frac{a}{1-\rho}+ \sum_{k=0}^\infty \rho^ke_{t-k},\] which explains why we need \(|\rho|<1\).
Again, we have
\[\mathbb{E}[x_t]=\frac{a}{1-\rho}, \] and when innovations are independent, \[\mathbb{V}[x_t]=\mathbb{V}\left[\sum_{k=0}^\infty \rho^ke_{t-k} \right]=\sum_{k=0}^\infty \mathbb{V}\left[\rho^ke_{t-k} \right]=\frac{\sigma^2}{1-\rho^2}.\]
In this case, we see that the mean and variance do not depend on \(t\)! This is apparent from the infinite-series representation, in which \(t\) no longer appears.
Moreover,
\[\mathbb{C}ov[x_t,x_{t+h}]=\mathbb{C}ov\left[\sum_{k=0}^\infty \rho^ke_{t-k} ,\sum_{j=0}^\infty \rho^je_{t-j+h} \right]=\sum_{k=0}^\infty \sum_{j=0}^\infty \mathbb{C}ov\left[ \rho^ke_{t-k}, \rho^je_{t-j+h} \right]\] Again, we need to focus on the cases when the time indices are equal, so \[\mathbb{C}ov[x_t,x_{t+h}]=\sum_{k=0}^\infty \sum_{j=h}^\infty \mathbb{C}ov\left[ \rho^ke_{t-k}, \rho^{j-h}e_{t-j} \right]=\sum_{k=0}^\infty \sum_{j=h}^\infty \rho^{k+j-h}\mathbb{C}ov[e_{t-k},e_{t-j} ]\]
Thus, for \(j=k\), we get \[\mathbb{C}ov[x_t,x_{t+h}]=\sigma^2 \sum_{j=h}^\infty \rho^{2j-h}=\sigma^2 \rho^h \sum_{j=0}^\infty \rho^{2j}=\sigma^2\frac{\rho^h}{1-\rho^2}.\] Again, this does not depend on the time index \(t\). If innovations are Gaussian, as Gaussian distributions are entirely defined by their first two moments, this means that the full (unconditional) distribution of the process is invariant. This very convenient property is called stationarity, and comes in various flavors (have a look on the net!). It is very suitable for analysis because now we know that the object that we study (the underlying distribution) is stable in time.
The above quantity, which measures the link between different realizations of the same variable (at different points in time) is very important. Often, the covariance is scaled by the variance to yield the auto-correlation of the process. Relatedly, there is the auto-correlogram, which is defined, for stationary series, as
\[\gamma(h)=\mathbb{C}orr[x_t,x_{t+h}].\] For the AR(1), \(\gamma(h)=\rho^h\).
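These closed-form moments can be verified by simulation. The sketch below (with illustrative values for \(a\), \(\rho\) and \(\sigma\)) simulates a long AR(1) path with arima.sim and compares the sample mean and variance with \(a/(1-\rho)\) and \(\sigma^2/(1-\rho^2)\).

```r
# Numerical check of the stationary moments of an AR(1)
set.seed(42)
a <- 0.1; rho <- 0.8; sigma <- 0.2
n <- 1e5
# arima.sim generates a zero-mean AR(1); we shift it by the theoretical mean
x <- a / (1 - rho) + arima.sim(list(ar = rho), n = n, sd = sigma)
c(mean = mean(x), theo_mean = a / (1 - rho))        # both close to 0.5
c(var  = var(x),  theo_var  = sigma^2 / (1 - rho^2)) # both close to 0.111
```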
More generally, autoregressive models of higher orders (AR(p)) have memory that goes back further in time: \[x_t=a+\sum_{k=1}^p\rho_k x_{t-k}+e_t, \quad a \in \mathbb{R},\]
Even more generally, there are ARMA(p,q) processes, defined as \[x_t=a+\sum_{k=1}^p\rho_k x_{t-k} +\sum_{j=1}^q\delta_je_{t-j}+e_t, \quad a \in \mathbb{R},\]
but more on these is left to your curiosity.
Let's simulate two such processes. We refer to the ARIMA page for details on the implementation. The order is \((p,d,q)\), as in the ARMA(p,q) process (the \(d\) is for integration, a notion outside the scope of this course).
x_05 <- arima.sim(list(order = c(1,0,0), ar=.05), n = N)
x_95 <- arima.sim(list(order = c(1,0,0), ar=.95), n = N)
tibble(time = 1:N, x_05 = x_05, x_95 = x_95) |>
pivot_longer(-time, names_to = "process", values_to = "values") |>
ggplot(aes(x = time, y = values, color = process)) + geom_line() +
theme_bw() +
scale_color_manual(values = c("#DD9911", "#0066AA"))
One of the processes seems to be just white noise, the other evolves more slowly and oscillates much less.
Let's have a look at the auto-correlogram of the series. In R, ACF means AutoCorrelation Function. There is a function that shows it in the form of a ggplot.
library(forecast) # For the ggAcf() function
ggAcf(x_95) + theme_bw() +
geom_function(fun = function(x) exp(x*log(0.95)), color = "#11DD33")
The values we extract from the sample are not exactly those predicted by the model. This comes from the size of the sample, which is too small. As it increases, the fit would improve (you can try!).
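To make the "you can try!" concrete, here is a small experiment comparing the sample ACF of an AR(1) with \(\rho=0.95\) to the theoretical \(\rho^h\), for a small and a large sample (the sizes are arbitrary choices):

```r
# Gap between sample ACF and the theoretical rho^h, for two sample sizes
set.seed(7)
rho  <- 0.95
theo <- rho^(1:10)
err <- sapply(c(120, 1e5), function(n) {
  x <- arima.sim(list(ar = rho), n = n)
  # maximum absolute gap over lags 1 to 10 (lag 0 is dropped)
  max(abs(acf(x, lag.max = 10, plot = FALSE)$acf[-1] - theo))
})
err  # the gap shrinks markedly as n grows
```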
ARCH models
In practice, simple autoregressive models can be too limited because some variables are not exactly stationary. To illustrate this statement, let's have a look at a volatility index, the VIX.
NOTE: usually, the volatility is the standard deviation of past returns. The VIX is slightly different, as it is based on forward-looking option prices. Still, it is linked (empirically) to the traditional volatility.
library(quantmod)
library(highcharter)
library(tidyverse)
min_date <- "1998-01-01" # Starting date
max_date <- "2025-11-07" # Ending date
getSymbols("^VIX", src = 'yahoo', # The data comes from Yahoo Finance
from = min_date, # Start date
to = max_date, # End date
auto.assign = TRUE,
warnings = FALSE)
[1] "VIX"
VIX <- VIX |> as.data.frame() |>
rownames_to_column("date") |>
dplyr::mutate(date = as.Date(date))
head(VIX, 7) # Have a look at the result!
| date | VIX.Open | VIX.High | VIX.Low | VIX.Close | VIX.Volume | VIX.Adjusted |
|---|---|---|---|---|---|---|
| 1998-01-02 | 24.34 | 24.93 | 23.42 | 23.42 | 0 | 23.42 |
| 1998-01-05 | 24.11 | 25.02 | 23.02 | 24.36 | 0 | 24.36 |
| 1998-01-06 | 25.20 | 25.97 | 24.87 | 25.66 | 0 | 25.66 |
| 1998-01-07 | 26.15 | 27.43 | 25.07 | 25.07 | 0 | 25.07 |
| 1998-01-08 | 26.24 | 26.70 | 25.62 | 26.01 | 0 | 26.01 |
| 1998-01-09 | 25.79 | 29.35 | 25.69 | 28.69 | 0 | 28.69 |
| 1998-01-12 | 28.69 | 31.08 | 28.02 | 28.02 | 0 | 28.02 |
vix_chart <- VIX |> dplyr::select(VIX.Close)
rownames(vix_chart) <- VIX |> dplyr::pull(date)
highchart(type = "stock") %>%
hc_title(text = "Evolution of the VIX") %>%
hc_add_series(as.xts(vix_chart)) %>%
hc_tooltip(pointFormat = '{point.y:.3f}')
Hence, we see that the series exhibits some moments of extreme activity and its distribution is not the same through time. This is referred to as "clusters of volatility".
Therefore, we also need to introduce models that allow for this type of feature. In 1982, Robert Engle proposed such a model, called ARCH, which was generalized in 1986 to GARCH models. As we show below, there is a direct link with auto-regressive processes!
The idea is to work with the innovations \(e_t\) and to specify them slightly differently, as \(e_t=\sigma_t z_t\), where \(\sigma_t\) is a time-varying standard deviation and \(z_t\) is a simple white noise. What matters is that the volatility term is modelled as \[\sigma^2_t = a+\sum_{j=1}^pd_j\sigma^2_{t-j} + \sum_{k=1}^qb_ke^2_{t-k},\] hence the value of \(\sigma_t^2\) depends both on its own past and on the past of squared innovations.
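To see the recursion at work, here is a minimal base-R simulation of a GARCH(1,1) under the notation above; the parameter values are purely illustrative, with persistence \(d_1+b_1<1\):

```r
# Minimal GARCH(1,1) simulation:
# e_t = sigma_t * z_t,  sigma_t^2 = a + d1 * sigma_{t-1}^2 + b1 * e_{t-1}^2
set.seed(123)
n  <- 2000
a  <- 1e-5; d1 <- 0.85; b1 <- 0.1   # illustrative values, d1 + b1 < 1
z  <- rnorm(n)
e  <- sig2 <- numeric(n)
sig2[1] <- a / (1 - d1 - b1)        # start at the unconditional variance
e[1]    <- sqrt(sig2[1]) * z[1]
for (t in 2:n) {
  sig2[t] <- a + d1 * sig2[t - 1] + b1 * e[t - 1]^2
  e[t]    <- sqrt(sig2[t]) * z[t]
}
# The e_t are serially uncorrelated, but their squares are not:
# this is the signature of volatility clustering.
c(acf_e = acf(e, plot = FALSE)$acf[2], acf_e2 = acf(e^2, plot = FALSE)$acf[2])
```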
References
For time-series
There are many resources available online. We single out two below:
For cryptocurrencies
Application to cryptocurrencies
For simplicity, we will work with daily data. Higher frequencies can be obtained outside the crypto space via the highfrequency package, or by playing with dedicated APIs (e.g., with packages such as binance/coinmarketcap).
Fetching data
We will use the crypto2 package below. If need be, install it.
library(crypto2)
First, let's have a look at the list of available coins… which are numerous!
coins <- crypto_list()
You can have a look at the info for the coins via the code below.
c_info <- crypto_info(coin_list = coins, limit = 30)
❯ Scraping crypto info
❯ Processing crypto info
Next, we can download historical quotes.
coin_names <- c("Bitcoin", "Ethereum", "Tether USDt", "BNB", "USDC", "Solana")
coin_hist <- crypto_history(coins |> dplyr::filter(name %in% coin_names),
start_date = "20170101",
end_date = "20251106")
❯ Scraping historical crypto data
❯ Processing historical crypto data
Coin USDC does not have data available! Cont to next coin.
Coin Solana does not have data available! Cont to next coin.
Coin Solana does not have data available! Cont to next coin.
coin_hist <- coin_hist |> # Timestamps are at midnight, hence we add a day.
dplyr::mutate(date = 1 + as.Date(as.POSIXct(timestamp, origin="1970-01-01")))
And plot them. NOTE: mind the log-scale for the y-axis.
Also, we use plotly below for a better experience.
library(plotly)
g <- coin_hist |>
ggplot(aes(x = date, y = close, color = name, label = symbol)) +
geom_line() +
scale_y_log10() + theme_bw() + xlab("")
ggplotly(g)
coin_hist |>
filter(symbol %in% c("USDT", "USDC")) |>
ggplot(aes(x = date, y = close, color = name, label = symbol)) +
geom_line() + theme_classic() +
theme(legend.position = c(0.8,0.8))
Now, the problem with some of these series is that they are NOT stationary. At the beginning of the sample, their values are much lower than those at the end of the sample. This is one of the reasons why, in finance, we look at returns: \[r_t=\frac{p_t-p_{t-1}}{p_{t-1}},\] which are simply relative price changes.
Let’s compute them below. We recall that prices are gathered in the close column.
coin_hist <- coin_hist |>
group_by(name) |>
mutate(return = close / lag(close) - 1 ) |>
ungroup()
And have a look at their distributions.
coin_hist |>
ggplot(aes(x = return, fill = name)) + geom_histogram() +
facet_wrap(vars(name), scales = "free") + theme_bw() +
theme(axis.title = element_blank(),
legend.position = c(0.85,0.2))
There seem to be outliers (to the left or right of the most common values)!
Estimations
Ok, so now let’s try to link what we have seen in the theoretical part with the data. This step is called estimation. Basically, if we think that the data follows \(x_t=bx_{t-1}+e_t\), we need to find \(b\)!
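As a warm-up on simulated data (where the true \(b\) is known), here is a sketch recovering \(b\) in two standard ways; the sample size and coefficient value are arbitrary choices:

```r
# Estimate b in x_t = b * x_{t-1} + e_t on simulated data
set.seed(1)
b_true <- 0.6
x <- as.numeric(arima.sim(list(ar = b_true), n = 5000))
# 1) least squares: regress x_t on x_{t-1} (no intercept, zero-mean process)
b_ols <- unname(coef(lm(x[-1] ~ x[-length(x)] - 1)))
# 2) maximum likelihood via arima()
b_mle <- unname(coef(arima(x, order = c(1, 0, 0)))["ar1"])
c(ols = b_ols, mle = b_mle)  # both close to 0.6
```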
Returns
First, let's have a look at autocorrelograms (of returns).
crypto_acf <-
coin_hist |>
na.omit() %>%
split(.$name) %>%
map(~acf(.$return, plot = F)) %>%
map_dfr(
~data.frame(lag = .$lag,
acf = .$acf,
ci = qnorm(0.975)/sqrt(.$n.used)
), .id = "name")
crypto_acf |>
filter(lag > 0, lag < 6) |>
ggplot(aes(x = lag, y = acf, fill = name)) +
geom_col(position = "dodge") + theme_bw()
But recall that some of the coins are stable!
Overall, however, there does not seem to be a strong memory pattern. Hence, daily returns appear to be mostly white noise.
Realized volatility
Then, let’s have a look at volatility. It’s measured as
\[v_t=\sqrt{\frac{1}{T-1}\sum_{s=1}^T(r_{t-s}-\bar{r})^2},\] where \(\bar{r}\) is the mean return in the sample of \(T\) observations. Below, we use an optimized (C++-based) function to compute it as a rolling standard deviation over 20 days.
library(RcppRoll)
coin_hist <- coin_hist |>
group_by(name) |>
na.omit() |>
mutate(real_vol_20 = roll_sd(return, n = 20, fill = NA, na.rm = T))
coin_hist |>
filter(name == "Bitcoin") |>
pivot_longer(all_of(c("close", "real_vol_20")), names_to = "series", values_to = "value") |>
ggplot(aes(x = date, y = value, color = series)) + geom_line() + theme_bw() +
facet_wrap(vars(series), nrow = 2, scales = "free") # + scale_y_log10()
Warning: Removed 19 rows containing missing values or values outside the scale range
(`geom_line()`).
Now let's have a look at the autocorrelation of the volatility: it's highly persistent.
crypto_acf <- coin_hist |>
na.omit() %>%
split(.$name) %>%
map(~acf(.$real_vol_20, plot = F)) %>%
map_dfr(
~data.frame(lag = .$lag,
acf = .$acf,
ci = qnorm(0.975)/sqrt(.$n.used)
), .id = "name")
crypto_acf |>
filter(lag > 0, lag < 15) |>
ggplot(aes(x = lag, y = acf, fill = name)) +
geom_col(position = "dodge", alpha = 0.8) + theme_bw() +
scale_fill_viridis_d(option = "magma", end = 0.9, begin = 0.1)
The decay is slow, but there is heterogeneity between volatile coins (BTC, ETH) and stable coins (Tether, USD Coin).
GARCH estimation
There are many packages for GARCH estimation. We propose two: rugarch and fGarch.
library(rugarch)
library(fGarch)
In fGarch, the GARCH(1,1) model is:
\[\sigma_t^2 = \omega + \alpha e_{t-1}^2 + \beta \sigma_{t-1}^2.\] In the output, there are some additional parameters that allow for a more general model:
- shape is the shape parameter of the conditional distribution.
- skew is the skewness parameter of the conditional distribution.
- delta is the exponent of the variance recursion.
garchFit(formula = ~garch(1,1),
data = coin_hist |>
filter(name == "Bitcoin") |>
pull(return))
Series Initialization:
ARMA Model: arma
Formula Mean: ~ arma(0, 0)
GARCH Model: garch
Formula Variance: ~ garch(1, 1)
ARMA Order: 0 0
Max ARMA Order: 0
GARCH Order: 1 1
Max GARCH Order: 1
Maximum Order: 1
Conditional Dist: norm
h.start: 2
llh.start: 1
Length of Series: 3599
Recursion Init: mci
Series Scale: 0.03616584
Parameter Initialization:
Initial Parameters: $params
Limits of Transformations: $U, $V
Which Parameters are Fixed? $includes
Parameter Matrix:
U V params includes
mu -0.54242878 0.5424288 0.05424288 TRUE
omega 0.00000100 100.0000000 0.10000000 TRUE
alpha1 0.00000001 1.0000000 0.10000000 TRUE
gamma1 -0.99999999 1.0000000 0.10000000 FALSE
beta1 0.00000001 1.0000000 0.80000000 TRUE
delta 0.00000000 2.0000000 2.00000000 FALSE
skew 0.10000000 10.0000000 1.00000000 FALSE
shape 1.00000000 10.0000000 4.00000000 FALSE
Index List of Parameters to be Optimized:
mu omega alpha1 beta1
1 2 3 5
Persistence: 0.9
--- START OF TRACE ---
Selected Algorithm: nlminb
R coded nlminb Solver:
0: 4901.4345: 0.0542429 0.100000 0.100000 0.800000
1: 4900.8875: 0.0542406 0.100241 0.102785 0.802279
2: 4900.4934: 0.0542376 0.0967681 0.103550 0.801681
3: 4899.8667: 0.0542282 0.0942551 0.108805 0.805932
4: 4899.5826: 0.0542052 0.0874419 0.108661 0.808260
5: 4899.1007: 0.0541662 0.0861723 0.106439 0.814966
6: 4898.9398: 0.0541060 0.0833008 0.102116 0.819852
7: 4898.9031: 0.0538701 0.0801999 0.102671 0.824669
8: 4898.8285: 0.0534942 0.0798192 0.100708 0.824800
9: 4898.8147: 0.0531302 0.0811762 0.0986961 0.825862
10: 4898.7632: 0.0527421 0.0806464 0.0991436 0.825272
11: 4898.7273: 0.0523551 0.0806526 0.100208 0.825137
12: 4898.6980: 0.0519643 0.0804215 0.100221 0.824860
13: 4898.4307: 0.0457054 0.0816254 0.0977542 0.825414
14: 4898.2126: 0.0394542 0.0771986 0.101900 0.827471
15: 4898.1952: 0.0394535 0.0778344 0.101600 0.827562
16: 4898.1826: 0.0394254 0.0778064 0.101391 0.827126
17: 4898.1741: 0.0393571 0.0783806 0.101475 0.826827
18: 4898.1678: 0.0392098 0.0785929 0.101648 0.826059
19: 4898.1623: 0.0389441 0.0777567 0.0999740 0.828222
20: 4898.1607: 0.0389436 0.0779510 0.0999344 0.828220
21: 4898.1594: 0.0389359 0.0779339 0.0998526 0.828108
22: 4898.1582: 0.0389159 0.0780777 0.0998470 0.828093
23: 4898.1570: 0.0388733 0.0780756 0.0998070 0.827985
24: 4898.1395: 0.0370046 0.0785011 0.100539 0.827062
25: 4898.1364: 0.0363306 0.0790836 0.100758 0.826188
26: 4898.1364: 0.0362860 0.0790228 0.100726 0.826273
27: 4898.1364: 0.0362954 0.0790275 0.100728 0.826267
Final Estimate of the Negative LLH:
LLH: -7049.248 norm LLH: -1.958669
mu omega alpha1 beta1
0.0013126537 0.0001033654 0.1007283915 0.8262667839
R-optimhess Difference Approximated Hessian Matrix:
mu omega alpha1 beta1
mu -3539164.249 27522486 -18158.17 9712.188
omega 27522486.443 -59759243800 -35528812.80 -59837318.494
alpha1 -18158.172 -35528813 -40979.40 -46777.394
beta1 9712.188 -59837318 -46777.39 -69578.250
attr(,"time")
Time difference of 0.01083612 secs
--- END OF TRACE ---
Time to Estimate Parameters:
Time difference of 0.06709695 secs
Title:
GARCH Modelling
Call:
garchFit(formula = ~garch(1, 1), data = pull(filter(coin_hist,
name == "Bitcoin"), return))
Mean and Variance Equation:
data ~ garch(1, 1)
<environment: 0x115f1c600>
[data = pull(filter(coin_hist, name == "Bitcoin"), return)]
Conditional Distribution:
norm
Coefficient(s):
mu omega alpha1 beta1
0.00131265 0.00010337 0.10072839 0.82626678
Std. Errors:
based on Hessian
Error Analysis:
Estimate Std. Error t value Pr(>|t|)
mu 1.313e-03 5.372e-04 2.444 0.0145 *
omega 1.034e-04 1.294e-05 7.985 1.33e-15 ***
alpha1 1.007e-01 1.211e-02 8.315 < 2e-16 ***
beta1 8.263e-01 1.730e-02 47.768 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log Likelihood:
7049.248 normalized: 1.958669
Description:
Mon Nov 10 15:02:19 2025 by user:
Now, with rugarch…
spec <- ugarchspec()
ugarchfit(data = coin_hist |>
filter(name == "Bitcoin") |>
na.omit() |>
pull(return),
spec = spec)@fit$coef
Warning in .sgarchfit(spec = spec, data = data, out.sample = out.sample, :
ugarchfit-->warning: solver failer to converge.
NULL
Here, the default solver failed to converge, so the fit returns NULL and no coefficients are displayed.
Note that the default ugarchspec() adds an ARMA(1,1) mean equation on top of the GARCH(1,1) variance recursion \[\sigma_t^2 = \omega + \alpha e_{t-1}^2 + \beta \sigma_{t-1}^2.\] In its output, ar1 and ma1 are mean-equation parameters, while alpha1 and beta1 correspond to \(\alpha\) and \(\beta\).
Forecasting
Of course, when it comes to cryptos, one of the most interesting (and hardest) topics is prediction. To show how tough this is, we will rely on the fable package.
It's highly integrated and performs many analyses automatically. Below, we use an ARIMA model for the forecast; see the reference for the parameters: https://fable.tidyverts.org/reference/ARIMA.html. Beyond ARIMA, the methods available include ETS (exponential smoothing state space), TSLM (time-series linear model, \(y_t=a+bx_t\)) and NNETAR (neural network autoregression, \(y_t=f(y_{t-1})\)).
Below, we let the function pick the optimal orders via the function pdq(). But orders can also be specified (see below).
library(fable)
library(tsibble)
library(feasts)
coin_hist |>
na.omit() |>
filter(name == "Ethereum") |>
distinct(time_close, .keep_all = TRUE) |>
tsibble(index = date) |>
model(
arima = ARIMA(return ~ 1 + pdq()),
snaive = SNAIVE(return)
) |>
forecast(h = 15) |>
autoplot() + theme_bw() + facet_grid(vars(.model))
coin_hist |>
na.omit() |>
filter(name == "Ethereum") |>
distinct(time_close, .keep_all = TRUE) |>
tsibble(index = date) |>
model(
arima = ARIMA(close ~ 1 + pdq()),
snaive = SNAIVE(close)
) |>
forecast(h = 15) |>
autoplot() + theme_bw() + facet_grid(vars(.model))
Let's have a look at the estimated coefficients from the models.
coin_hist |>
na.omit() |>
filter(symbol == "ETH") |>
distinct(time_close, .keep_all = TRUE) |>
tsibble(index = date) |>
model(
arima = ARIMA(return ~ 1 + pdq()),
snaive = SNAIVE(return)
) |>
coef()
| .model | term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|---|
| arima | ar1 | -0.5354339 | 0.0149157 | -35.8972213 | 0.0000000 |
| arima | sar1 | -0.0260982 | 0.0176399 | -1.4794997 | 0.1391049 |
| arima | constant | 0.0000246 | 0.0010398 | 0.0236971 | 0.9810957 |
Note: sar is an auto-regressive component with seasonality.
We can specify a particular set of orders p, d and q.
coin_hist |>
na.omit() |>
filter(symbol == "ETH") |>
distinct(time_close, .keep_all = TRUE) |>
tsibble(index = date) |>
model(
arima = ARIMA(return ~ 1 + pdq(1,0,1)),
snaive = SNAIVE(return)
) |>
coef()
| .model | term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|---|
| arima | ar1 | -0.7527147 | 0.1694571 | -4.441919 | 0.0000092 |
| arima | ma1 | 0.7250801 | 0.1765664 | 4.106557 | 0.0000412 |
| arima | constant | 0.0053468 | 0.0014831 | 3.605193 | 0.0003167 |
Prediction with LSTM (RNN)
First, because we are using an M1 Mac, we must resort to a trick to make tensorflow work. See the following post if you are also on a recent Mac.
We also build the time-series from which to learn.
library(reticulate)
use_condaenv("r-tensorflow") # This is for M1 Macs with special conda environment
library(TSLSTM)
data_coin <- coin_hist |>
distinct(symbol, time_close, .keep_all = TRUE) |>
select(timestamp, symbol, close, volume) |>
filter(symbol %in% c("BTC", "ETH")) |>
group_by(symbol) |>
mutate(date = as.Date(timestamp),
return = close / lag(close) - 1,
vol_change = volume / lag(volume) - 1) |>
ungroup() |>
select(date, symbol, return, vol_change) |>
pivot_wider(names_from = "symbol", values_from = c("return", "vol_change")) |>
mutate(return_ETH = lead(return_ETH)) |> # We forward the ETH return, as we want to predict it
na.omit()
train_coin <- data_coin |> filter(date < "2022-06-30")
test_coin <- data_coin |> filter(date > "2022-06-30")
For technical reasons, we need Tensorflow to run on the CPU and not the GPU (it crashes otherwise on my Mac!)
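Before turning to the RNN, a useful yardstick is a plain linear baseline. The sketch below runs on synthetic data (the column names mimic the real ones, but the numbers are simulated with a deliberately weak signal, as with real daily returns); it is not the author's pipeline, just an illustration of how close such a baseline typically sits to a naive always-predict-zero forecast.

```r
# Hypothetical linear baseline on synthetic data mimicking the return columns
set.seed(2024)
n <- 1200
df <- data.frame(return_BTC     = rnorm(n),
                 vol_change_BTC = rnorm(n),
                 vol_change_ETH = rnorm(n))
# small true signal drowned in noise
df$return_ETH <- 0.01 * df$return_BTC + rnorm(n, sd = 0.04)
train <- 1:1000; test <- 1001:n
fit <- lm(return_ETH ~ ., data = df[train, ])
mae_lm    <- mean(abs(predict(fit, df[test, ]) - df$return_ETH[test]))
mae_naive <- mean(abs(df$return_ETH[test]))  # naive: always predict zero
c(linear = mae_lm, naive = mae_naive)        # typically very close
```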
import tensorflow as tf
# Set the device to CPU
tf.config.set_visible_devices([], 'GPU')
The idea is to predict the ETH return with the past BTC return, past ETH volume change and past BTC volume change.
library(keras3)
model_rnn <- keras_model_sequential(input_shape = c(nrow(train_coin), 3)) %>%
layer_gru(units = 9, # Nb units in hidden layer
name = "layer_1",
kernel_initializer = 'glorot_uniform',
bias_initializer = 'glorot_uniform',
#input_dim = c(1, nrow(train_coin), 3),
#batch_input_shape = c(1, nrow(train_coin), 3), # Dimensions = tricky part!
activation = 'tanh', # Activation function
return_sequences = TRUE) %>% # Return all the sequence
layer_dense(units = 1) # Final aggregation layer
model_rnn %>% compile(
loss = 'mean_squared_error', # Loss = quadratic
optimizer = optimizer_rmsprop(), # Backprop
metrics = c('mean_absolute_error') # Output metric MAE
)
Onto training!
fit_rnn <- model_rnn %>%
fit(x = train_coin |> # Training features
select(return_BTC, vol_change_BTC, vol_change_ETH) |>
data.matrix() |>
array(dim = c(1, nrow(train_coin), 3)),
y = train_coin |> pull(return_ETH) |>
array(dim = c(1,nrow(train_coin))), # Training labels
epochs = 15, # Number of rounds
batch_size = 1 , # Length of sequences
verbose = 0) # No comments
plot(fit_rnn) + theme_bw() + geom_point(color = "black") + geom_line()
Now, unfortunately, we cannot use the fitted model as such because the dimensions between the training and testing sets are not the same. We have to create a new model and import the old model's weights…
new_model <- keras_model_sequential(input_shape = c(nrow(train_coin), 3)) %>%
layer_gru(units = 9,
#batch_input_shape = c(1, nrow(test_coin), 3), # New dimensions
kernel_initializer='glorot_uniform',
bias_initializer='glorot_uniform',
name = "layer_1",
activation = 'tanh', # Activation function
return_sequences = TRUE) %>% # Return the full sequence
layer_dense(units = 1) # Output dimension
new_model %>% keras3::set_weights(keras3::get_weights(model_rnn))
We are now ready.
preds_rnn <- predict(new_model,
test_coin |>
select(return_BTC, vol_change_BTC, vol_change_ETH) |>
data.matrix() |>
array(dim = c(1, nrow(test_coin), 3)))
1/1 - 0s - 109ms/step
mean(abs(preds_rnn - test_coin |> pull(return_ETH)))
[1] 0.1017152
Note: because of the random initialization of the weights, the result is itself random.
To be compared with the training metric…